
Critical Care Explorations

Ovid Technologies (Wolters Kluwer Health)

Preprints posted in the last 30 days, ranked by how well they match the content profile of Critical Care Explorations, based on 15 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
State-Dependent Parameter Relevance in Intensive Care: Syndrome-Specific Centroids Improve Orbit-Based Mortality Prediction from AUC 0.59 to 0.83 in 59,362 Predictions

Basilakis, A.; Duenser, M. W.

2026-04-08 intensive care and critical care medicine 10.64898/2026.04.05.26350216 medRxiv
Top 0.1%
37.6%

Background: The Therapeutic Distance framework (Paper 1) achieved AUC 0.61 for orbit-based mortality prediction in 11,627 sepsis patients. We hypothesised that incorporating state-dependent parameter relevance would substantially improve prediction. Methods: We extended the framework to 84,176 ICU patients from MIMIC-IV v3.1 across 16 clinical syndromes. Validation included full-population leave-one-out (n=59,362), head-to-head comparison against SAPS-II and logistic regression on 34,467 matched patients with bootstrap confidence intervals, temporal validation, outcome permutation, sensitivity analysis, and calibration assessment. Results: Full-population leave-one-out achieved AUC 0.832 (n=59,362). On 34,467 matched patients, Therapeutic Distance (AUC 0.841) significantly outperformed both SAPS-II (0.786; delta=+0.055, 95% CI +0.048 to +0.061, p<0.001) and logistic regression (0.788). Temporal validation showed stable performance (delta=-0.006). Outcome permutation confirmed genuine signal (AUC 0.859 to 0.498 with shuffled mortality). Sensitivity analysis demonstrated near-zero variation (delta 0.0006-0.003). The framework performed well for 8 of 16 syndromes (AUC >0.70) and failed for DKA and post-cardiac surgery (AUC <0.40). Conclusions: Therapeutic Distance provides therapy-specific risk stratification that exceeds both established severity scores and standard machine learning while remaining robust to hyperparameter choices, temporal drift, and outcome permutation.
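The outcome-permutation check reported above (AUC falling from 0.859 to 0.498 once mortality labels are shuffled) is a generic safeguard against leakage and overfitting. A minimal sketch on synthetic data, using a rank-based (Mann-Whitney) AUC; the risk score here is a random stand-in, not the paper's therapeutic-distance model:

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC via the Mann-Whitney U statistic (continuous scores, no ties)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
n = 5000
risk = rng.normal(size=n)                    # synthetic stand-in risk score
died = (rng.random(n) < 1 / (1 + np.exp(-risk))).astype(int)

auc_real = auc(risk, died)                   # informative score: AUC well above 0.5
auc_perm = auc(risk, rng.permutation(died))  # shuffled outcomes: AUC collapses to ~0.5
```

If the permuted-label AUC stays materially above 0.5, something in the pipeline is leaking outcome information into the features.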

2
The Peripheral Use of Low-dose Vasopressors for Safety and Efficacy (PULSE) in the intensive care unit: a prospective, unblinded feasibility study protocol

Wiseman, J.; Sibley, S.; Perez-Patrigeon, S.; Mekhaeil, M.; Hanley, M.; Hunt, M.; Boyd, T.; Grant, B.; Boyd, J. G.

2026-04-20 intensive care and critical care medicine 10.64898/2026.04.13.26349750 medRxiv
Top 0.1%
22.7%

Introduction: There is increasing interest in the peripheral administration of vasopressors for two main reasons: (1) to expedite vasopressor initiation in patients with refractory shock and (2) to avoid the potential complications associated with central venous catheter placement. The current evidence on peripheral vasopressor administration is primarily based on single-center observational studies. There are inconsistencies in the administration of peripheral vasopressors, including catheter gauge and location, monitoring practices, vasopressor concentrations, and duration of use. This has made it difficult for institutions to develop best-practice guidelines. A randomized controlled trial is needed to address this knowledge gap. Methods and analysis: The Peripheral Use of Low-dose Vasopressors for Safety and Efficacy (PULSE) in the intensive care unit is a prospective, unblinded feasibility study. Eligible patients will be 18 years or older, have no existing central venous catheter or peripherally inserted central catheter, and have shock requiring a minimum vasopressor dose of any of the following: norepinephrine 0.0625 mcg/kg/min, phenylephrine 0.625 mcg/kg/min, or epinephrine 0.0625 mcg/kg/min. Fifty patients will be randomized 1:1 into either the peripheral venous catheter or central venous catheter group. The primary outcome is feasibility, defined as (1) a recruitment rate of 4 participants per month, (2) a data capture rate of ≥90%, and (3) a <50% conversion rate from peripheral to central access. The secondary outcomes include the safety of peripheral vasopressor use, alive and central-line-free days, the number of attempts needed to place a catheter, volume status, in-hospital mortality rate, ICU and hospital length of stay, and patient-centred outcomes.
Implications: The data collected from this study will inform the design of a definitive randomized controlled trial to assess the safety and efficacy of protocol-driven peripheral vasopressor administration. Ethics and dissemination: This study received approval (6042888) from the Queen's University Health Sciences/Affiliated Teaching Hospitals Research Ethics Boards. Results of this study will be presented at critical care conferences and submitted for publication. Trial registration number: NCT06920173 (https://clinicaltrials.gov/study/NCT06920173).

3
A grading system of dynamic fibrinolysis resistance in sepsis associates with ICU outcomes

Coupland, L. A.; Frost, S. A.; Lin, J.; Pham, N.; Suryana, E.; Self, M.; Chia, J.; Lam, T.; Liu, Z.; Jaich, R.; Crispin, P.; Rabbolini, D.; Law, R.; Keragala, C.; Medcalf, R.; Aneman, A.

2026-03-27 intensive care and critical care medicine 10.64898/2026.03.25.26349336 medRxiv
Top 0.1%
22.2%

Rationale: Fibrinolysis resistance in sepsis associates with thrombotic burden, multi-organ failure and death. The degrees and dynamics of resistance that associate with mortality in acute sepsis are unknown, and a simple tool to aid clinician interpretation of fibrinolysis measurements is lacking. Objectives: To establish a point-of-care grading tool of fibrinolysis resistance that aligns with scoring systems for disease acuity, is substantiated by plasma fibrinolysis markers and enables rapid investigation of the fibrinolysis state at the point of care. Methods: Prospective observational study of 116 adult sepsis/septic shock patients with sequential measurements of fibrinolysis resistance during Intensive Care Unit (ICU) admission using tissue plasminogen activator (tPA)-enhanced viscoelastic testing (VET). The clot lysis time (TPA-LT) adjusted for fibrin clot amplitude (TPA-LT/FIBA10, sec/mm) underwent cluster analysis and was evaluated against disease severity scores, standard pathology, clinical outcomes and fibrinolysis markers. Measurements and Main Results: Three clusters of progressively increasing fibrinolysis resistance were identified (Grades 1-3). At admission, Grade 3 associated with the highest disease severity, organ failure, haematological and biochemical perturbations, fibrinolysis marker inhibitory profile and mortality (42% versus 24% and 15% in Grade 2 and Grade 1, respectively), with a hazard ratio for death at 28 days of 3.9 [95% CI 1.4-11] compared to Grade 1. Transitions between grades were frequent over 7 days, with a reduced grade associated with a decreased risk of death. Conclusions: Grading of fibrinolysis resistance in sepsis enables rapid identification of patients at greatest mortality risk, with any dynamic improvement corresponding to favourable clinical outcomes.
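The three ordered grades here come from cluster analysis of a single derived marker (TPA-LT/FIBA10). A toy sketch of one way such a grading can be derived from a one-dimensional measurement, using a tiny k-means; the ratio values below are synthetic and the authors' actual clustering method may differ:

```python
import numpy as np

def grade_1d(x, k=3, iters=50):
    """Tiny 1-D k-means; returns grades 1..k ordered by centroid (1 = lowest)."""
    x = np.asarray(x, dtype=float)
    centroids = np.quantile(x, np.linspace(0.1, 0.9, k))     # spread-out init
    for _ in range(iters):
        labels = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean()
    order = centroids.argsort()
    remap = np.empty(k, dtype=int)
    remap[order] = np.arange(1, k + 1)
    return remap[labels], np.sort(centroids)

rng = np.random.default_rng(1)
# synthetic TPA-LT/FIBA10 ratios (sec/mm) for three notional resistance groups
ratio = np.concatenate([rng.normal(20, 3, 40),
                        rng.normal(60, 5, 40),
                        rng.normal(150, 10, 40)])
grades, centres = grade_1d(ratio)
```

Ordering grades by centroid value is what makes "Grade 3" consistently mean the most fibrinolysis-resistant cluster across datasets.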

4
Physiological subphenotypes of ARDS: Prognostic and predictive enrichment for PEEP strategy

Meza-Fuentes, G.; Delgado, I.; Barbe, M.; Sanchez-Barraza, I.; Filippini, D.; Smit, M. R.; Sinnige, J. S.; Kramer, L.; Smit, J.; Jonkman, A.; Meade, M.; Retamal, M. A.; Lopez, R.; Bos, L. D. J.

2026-03-30 intensive care and critical care medicine 10.64898/2026.03.27.26349397 medRxiv
Top 0.1%
21.9%

Background Acute respiratory distress syndrome (ARDS) is characterised by substantial physiological heterogeneity, which contributes to highly variable clinical outcomes and therefore inconsistent responses to ventilatory strategies. We aimed to externally validate physiological ARDS subphenotypes previously identified using routine ventilatory and gas-exchange variables, assess their prognostic relevance across independent cohorts, and examine heterogeneity of treatment effect according to PEEP strategy. Methods Unsupervised Gaussian Mixture Modelling was used to identify physiological subphenotypes based on ventilatory mechanics and gas-exchange parameters. Labels were subsequently used to train and validate supervised classifiers using XGBoost. Prognostic relevance was assessed across three independent cohorts, including two randomised controlled trials (ALVEOLI and LOVS). Predictive enrichment for PEEP strategy was evaluated using individual patient data from ALVEOLI and LOVS (n = 1,532) using intention-to-treat analyses, applying both one-stage and two-stage fixed-effects IPD meta-analytic approaches to test for interaction between physiological subphenotype and PEEP strategy. Results Two distinct physiological subphenotypes, termed Efficient and Restrictive, were replicated across independent cohorts. Across each cohort, patients classified as Restrictive consistently exhibited higher all-cause 28-day mortality compared to Efficient patients. When pooled across studies, the Restrictive subphenotype was associated with a significantly increased risk of death (pooled odds ratio 1.75, 95% CI 1.36-2.24), with no evidence of between-study heterogeneity. Predictive analyses showed a statistically significant interaction between physiological subphenotype and PEEP strategy in the one-stage IPD model (p for interaction = 0.037), with concordant findings in the two-stage fixed-effects IPD meta-analysis (interaction OR 1.91, 95% CI 1.00-3.66; I2 = 0%).
Higher PEEP was associated with increased mortality in Efficient patients and reduced mortality in Restrictive patients, indicating effect modification by physiological subphenotype. Interpretation Physiological ARDS subphenotypes derived from routinely collected bedside data provide robust and externally validated prognostic stratification across observational and randomised trial cohorts. The observed interaction with PEEP strategy suggests that underlying physiological profiles may influence treatment response, supporting the concept that physiology-based stratification could be a starting point for personalised medicine and therefore better ventilatory strategies in future clinical trials.
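Gaussian Mixture Modelling, as used for the subphenotyping above, assigns each patient a posterior probability of belonging to each component. A self-contained one-dimensional sketch of the underlying EM procedure on synthetic data (two overlapping groups standing in for Efficient and Restrictive; the paper's model is multivariate and the feature below is invented):

```python
import numpy as np

def gmm2_em(x, iters=200):
    """Two-component 1-D Gaussian mixture fitted by EM; returns (resp, means)."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])        # crude but deterministic initialisation
    sd = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return resp, mu

rng = np.random.default_rng(2)
# synthetic 1-D feature: lower values ~ "Efficient", higher ~ "Restrictive"
x = np.concatenate([rng.normal(11, 2, 300), rng.normal(19, 2, 300)])
resp, mu = gmm2_em(x)
label = resp.argmax(axis=1)                  # hard subphenotype assignment
```

The soft responsibilities (`resp`) are what make mixture-model labels usable as training targets for a downstream supervised classifier such as XGBoost.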

5
Therapeutic Distance: An Orbit-Based Framework for ICU Decision Support - Initial Validation in 11,627 Sepsis Patients from MIMIC-IV

Basilakis, A.

2026-04-04 intensive care and critical care medicine 10.64898/2026.04.02.26350049 medRxiv
Top 0.1%
21.8%

Background: Patient matching in intensive care databases yields sample sizes too small for individualised outcome analysis. Current AI systems provide population-level guideline summaries but omit stratification variables that may invert therapy signals at the individual level. Methods: We developed the Therapeutic Distance framework, which computes the z-standardised distance between a patient's clinical parameters and the centroid of MIMIC-IV patients who received each therapy: d(P,T) = Σ_i w_i(T) × |(L_i − μ_i(T)) / σ_i|. We hypothesise that patients at the same distance to a therapy (same orbit) have comparable outcomes. Six validation experiments were performed on 11,627 sepsis patients (SAPS-II 30-80) from MIMIC-IV v3.1. Results: Echo-stratified vasopressin recipients showed mortality of 30.1% (n=146, 95% CI 22.6-37.7%) versus 53.9% without echo (n=2,426, 95% CI 51.9-55.9%). Confidence intervals did not overlap (bootstrap, 1,000 resamples). However, echo-stratified patients had lower general severity (SAPS-II 49.2 vs 53.9) but higher cardiac biomarkers (troponin 1.0 vs 0.51 ng/mL), indicating that the observed difference is compatible with both severity confounding and a possible cardiac-specific vasopressin effect. Leave-one-out prediction with uniform weights achieved AUC 0.61 as a structural baseline. Conclusions: Therapeutic Distance replaces patient matching with orbit matching, substantially increasing usable sample sizes. The echo-vasopressin finding is hypothesis-generating and mechanistically plausible but not causally proven. The framework is intended as a clinical decision support signal under uncertainty, not as a causal inference method.
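The distance defined above is simply a weighted sum of absolute z-scores between a patient and a therapy centroid. A minimal sketch with three hypothetical parameters (the variable labels and numbers are illustrative, not taken from MIMIC-IV):

```python
import numpy as np

def therapeutic_distance(patient, centroid, sigma, weights):
    """d(P,T) = sum_i w_i(T) * |(L_i - mu_i(T)) / sigma_i|: weighted absolute z-scores."""
    z = np.abs((np.asarray(patient, dtype=float) - centroid) / sigma)
    return float(np.sum(weights * z))

# hypothetical parameters (illustrative only):
# lactate (mmol/L), creatinine (mg/dL), mean arterial pressure (mmHg)
patient  = np.array([4.0, 1.8, 62.0])
centroid = np.array([2.5, 1.2, 70.0])   # mean values among recipients of therapy T
sigma    = np.array([1.5, 0.6, 10.0])   # population standard deviations
weights  = np.array([1.0, 1.0, 1.0])    # uniform weights, as in the paper's baseline

d = therapeutic_distance(patient, centroid, sigma, weights)   # 1.0 + 1.0 + 0.8 = 2.8
```

Patients with similar `d` to a given therapy form an "orbit"; the framework's claim is that outcomes are comparable within an orbit, which is what makes orbit matching a substitute for one-to-one patient matching.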

6
Comparing prognostic performance and reasoning between large language models and physicians

Gjertsen, M.; Yoon, W.; Afshar, M.; Temte, B.; Leding, B.; Halliday, S.; Bradley, K.; Kim, J.; Mitchell, J.; Sanders, A. K.; Croxford, E. L.; Caskey, J.; Churpek, M. M.; Mayampurath, A.; Gao, Y.; Miller, T.; Kruser, J. M.

2026-04-25 intensive care and critical care medicine 10.64898/2026.04.17.26350898 medRxiv
Top 0.1%
14.1%

Importance: Physicians routinely prognosticate to guide care delivery and shared decision making, particularly when caring for patients with critical illnesses. Yet, these physician estimates are prone to inaccuracy and uncertainty. Artificial intelligence, including large language models (LLMs), shows promise in supporting or improving this prognostication. However, the performance of contemporary LLMs in prognosticating for the heterogeneous population of critically ill patients remains poorly understood. Objective: To characterize and compare the performance of LLMs and physicians when predicting 6-month mortality for hospitalized adults who survived critical illness. Design: Embedded mixed methods study with elicitation and comparison of prognostic estimates and reasoning from LLMs and practicing physicians. Setting: The publicly available, deidentified Medical Information Mart for Intensive Care (MIMIC)-IV v2.2 dataset. Participants: We randomly selected 100 hospitalizations of adult survivors of critical illness. Four contemporary LLMs (OpenAI GPT-4o, o3-mini and o4-mini, and DeepSeek-R1) and 7 physicians provided independent prognostic estimates for each case (1,100 total estimates; 400 LLM and 700 physician). Main outcomes and measures: For each case, LLMs and physicians used the hospital discharge summary and demographics to predict 6-month mortality (yes/no) and provide their reasoning (free text). We assessed prognostic performance using accuracy, sensitivity, and specificity, and used inductive, qualitative content analysis to characterize reasoning. Results: Mean physician accuracy for predicting mortality was 70.1% (95% CI 63.7-76.4%), with sensitivity of 59.7% (95% CI 50.6-68.8%) and specificity of 80.6% (95% CI 71.7-88.2%). The top-performing LLM (OpenAI o4-mini) accuracy was 78.0% (95% CI 70.0-86.0%), with sensitivity of 80.0% (95% CI 67.4-90.2%) and specificity of 76.0% (95% CI 63.3-88.0%).
The difference between mean physician and top-performing LLM accuracy was not statistically significant (p = 0.5). Qualitative analysis revealed similar patterns in LLM and physician expressed reasoning, except that physicians regularly and explicitly reported uncertainty while LLMs did not. Conclusion and Relevance: In this study, LLMs and physicians achieved comparable, moderate performance in predicting 6-month mortality after critical illness, with similar patterns in expressed reasoning. Our findings suggest LLMs could be used to support prognostication in clinical practice but also raise safety concerns due to the lack of LLM uncertainty expression.
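The metrics above (accuracy, sensitivity, specificity with 95% CIs) follow directly from a 2×2 confusion table. A sketch with hypothetical counts; the Wilson score interval is used here as one standard choice of binomial CI (the study's exact CI method is an assumption):

```python
import math

def wilson_ci(k, n, z=1.96):
    """Wilson score 95% interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# hypothetical confusion-table counts over 100 cases (illustrative only)
tp, fn, tn, fp = 40, 10, 38, 12
accuracy = (tp + tn) / (tp + fn + tn + fp)
sens = tp / (tp + fn)                        # 0.80
spec = tn / (tn + fp)                        # 0.76
sens_lo, sens_hi = wilson_ci(tp, tp + fn)    # roughly 0.67 to 0.89
```

The wide intervals at n = 100 are one reason the physician-versus-LLM accuracy difference above fails to reach significance.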

7
Clinician-Informed Feature Engineering Improves Machine Learning Assignment of Molecular Endotypes in the Intensive Care Unit

Sines, B. J.; Hagan, R. S.; Jiang, X.; Pavlechko, E.; McClain, S.; Hunt, X.; Florou-Moreno, J.; Acquardo, J.; Risa, G.; Valsaraj, V.; Schisler, J. C.; Wolfgang, M. C.

2026-04-07 intensive care and critical care medicine 10.64898/2026.04.06.26350248 medRxiv
Top 0.1%
12.3%

Objective: To develop a workflow that transforms electronic health record data into machine learning-ready features for molecular endotype assignment and to evaluate whether clinician-informed feature engineering improves model performance and interpretability. Materials and Methods: We developed parallel clinician-informed and clinician-agnostic feature engineering pipelines to prepare raw EHR data from mechanically ventilated patients with respiratory failure. Molecular endotype labels derived from paired deep lung and blood profiling of subjects with acute lung injury were used to train candidate machine learning classifiers. Champion models from each pipeline were compared on predefined performance metrics. Results: Bayesian network classifiers were the top-performing models in both pipelines. The clinician-informed pipeline generated fewer features than the clinician-agnostic pipeline (645 vs 1,127) and produced a lower misclassification rate in the final Bayesian network model (0.047 vs 0.14). In an independent cohort of subjects with acute lung injury, the clinician-informed model better distinguished corticosteroid-responsive from non-responsive subgroups. Discussion: Clinical context improved feature engineering efficiency, model interpretability, and classification performance. These findings support the integration of domain expertise into machine learning workflows intended for critical care implementation. Conclusions: Clinician-informed feature engineering can simplify machine learning models while improving performance and preserving clinical relevance. AI tools developed for healthcare should incorporate subject matter expertise early in the feature engineering and analytic workflow.

8
Pre-illness Clonal Hematopoiesis of Indeterminate Potential is an Independent Predictor of Morbidity and Mortality in Sepsis

Berg, N. K.; Kerchberger, V. E.; Pershad, Y.; Corty, R. W.; Bick, A. G.; Ware, L. B.

2026-04-15 intensive care and critical care medicine 10.64898/2026.04.14.26350864 medRxiv
Top 0.1%
10.1%

Rationale: Sepsis is a life-threatening syndrome causing significant morbidity and mortality especially in the aging population. Clonal hematopoiesis of indeterminate potential (CHIP) is an age-related condition of clonal expansion of hematopoietic stem cells harboring somatic mutations associated with increased incidence of chronic illness and all-cause mortality. Objective: Evaluate the association of pre-illness CHIP with mortality and morbidity in patients admitted to the ICU with sepsis. Methods: We performed a retrospective study using a de-identified electronic health record linked with a DNA biorepository. We identified adult patients with sepsis who had DNA collected prior to ICU admission. We tested the association between CHIP status, determined from whole-genome sequencing, and ICU mortality, organ support-free days, and long-term survival adjusting for age, sex, race and Sequential Organ Failure Assessment (SOFA) score on ICU admission. Measurements and Main Results: Pre-illness CHIP was associated with increased sepsis mortality (OR = 1.54, 95% CI 1.13 to 2.07, P = 0.005) and fewer days alive and free of organ support (-1.7 days, 95% CI -3.2 to -0.2, P = 0.028) after adjusting for age, sex, race, and SOFA score. In sepsis survivors, CHIP was also associated with increased long-term mortality after discharge (HR 1.40, 95% CI 1.01 to 1.93, P = 0.041). Conclusions: Pre-illness CHIP was independently associated with increased mortality and morbidity in critically-ill adults with sepsis. These findings suggest that CHIP is a risk factor for sepsis severity. Elucidating the mechanism underlying this association could uncover new therapeutic interventions for sepsis.

9
Re-evaluation Of Hypo- And Hyperoxemia In Patients With Respiratory Failure And Veno-Venous Extracorporeal Membrane Oxygenation

Buenger, V.; Russ, M.; Hunsicker, O.; La Via, L.; Menk, M.; Kuebler, W.; Weber-Carstens, S.; Graw, J.

2026-04-07 intensive care and critical care medicine 10.64898/2026.04.01.26349732 medRxiv
Top 0.1%
6.9%

Background: Many patients in the ICU receive oxygen to secure blood and tissue oxygenation. Increasing evidence shows that exposure to high fractions of inspired oxygen (FiO2) is associated with adverse effects. In patients with severe ARDS, veno-venous Extracorporeal Membrane Oxygenation (VV-ECMO) can be implemented as a rescue therapy, and PaO2 levels can be controlled by the blood flow of the VV-ECMO. Yet, optimal oxygenation targets in ARDS patients treated with VV-ECMO are unknown. Methods: Retrospective analysis of 443 patients with severe ARDS treated with VV-ECMO. Regression analyses were performed for mortality and time-weighted averages of PaO2 and FiO2. Furthermore, considering a possible non-linear relationship, a restricted cubic spline (RCS) model was performed for PaO2. Results: A simple logistic regression for mean PaO2 and ICU mortality showed a significant association (per mmHg OR 0.99 [95%CI 0.98-1.00], p=0.002). RCS analysis showed a U-shaped association of mortality and mean PaO2 (PaO2 69.70-90.24 mmHg: OR 0.92 [95%CI 0.89-0.94], p<0.001; PaO2 90.24-123.40 mmHg: OR 1.09 [95%CI 1.06-1.13], p<0.001). A model including PaO2 as RCS variable and FiO2 showed significant associations of mortality with both variables (PaO2 69.70-90.24 mmHg: OR 0.94 [95%CI 0.91-0.97], p<0.001; PaO2 90.24-123.40 mmHg: OR 1.07 [95%CI 1.04-1.11], p<0.001; FiO2: OR 35.98 [95%CI 8.67-158.60], p<0.001, VIF<1.11). Conclusions: PaO2 levels in patients with ARDS and VV-ECMO have a U-shaped association with mortality. Optimal outcomes are observed in the 90-123 mmHg range, which is higher compared to non-ECMO settings. Whether this is explainable by increased tissue oxygenation with concurrent avoidance of pulmonary hypoxia should be the subject of future research.

10
Observation-process features are associated with larger domain shift in sepsis mortality prediction: a cross-database evaluation using MIMIC-IV and eICU-CRD

Yamamoto, R.; Wu, F.; Sprehe, L. K.; Abeer, A.; Celi, L. A.; Tohyama, T.

2026-04-06 intensive care and critical care medicine 10.64898/2026.04.05.26350209 medRxiv
Top 0.1%
6.3%

Clinical prediction models for sepsis frequently degrade when applied outside the development setting. Electronic health record data encode not only patient physiology but also observation processes such as measurement timing and frequency, which may be predictive within a site but unstable across sites. The contribution of these observation-process features to cross-site performance degradation has not been quantified. In this retrospective cohort study, we developed models for in-hospital mortality in adult intensive care unit (ICU) patients meeting Sepsis-3 criteria using the Medical Information Mart for Intensive Care IV (MIMIC-IV) (n = 30,218; 16.3% mortality) and externally validated them in the eICU Collaborative Research Database (eICU-CRD) (n = 31,403; 13.9% mortality). We compared seven prespecified model specifications representing physiologic summary strategies (a single aggregate severity score, most recent values, extreme values, and within-window variability), each evaluated with and without measurement counts as observation-process features. Models were fit using logistic regression and gradient-boosted trees. Internally, discrimination improved with more detailed physiologic summaries and measurement counts (logistic regression area under the receiver operating characteristic curve [AUROC] from 0.819 to 0.834). In external validation, performance drops were larger for specifications using more complex physiologic representations. Adding measurement counts was associated with larger domain shift (logistic regression AUROC change -0.047 without counts versus -0.082 with counts). External calibration deteriorated progressively, with calibration slopes decreasing from 1.007 for the simplest model to 0.417 for the most complex specification in logistic regression. Gradient-boosted trees showed smaller incremental degradation from measurement counts but still exhibited domain shift in complex specifications.
Inclusion of observation-process features in sepsis mortality prediction models was associated with improved internal discrimination but worse external calibration and transportability. These findings highlight that feature engineering decisions involve a tradeoff between internal performance and external generalizability, and that calibration assessment provides the most sensitive indicator of reduced transportability.
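The calibration slope this abstract leans on is the coefficient from regressing the observed outcome on the logit of the predicted probability: 1.0 means well calibrated, well below 1.0 means overconfident predictions that do not transport. A self-contained sketch on synthetic predictions; the Newton-Raphson loop is a minimal stand-in for a standard logistic regression routine:

```python
import numpy as np

def calibration_slope(p_hat, y, iters=50):
    """Slope from logistic regression of y on logit(p_hat); 1.0 = well calibrated."""
    lp = np.log(p_hat / (1 - p_hat))
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(iters):                   # Newton-Raphson for the 2-parameter fit
        mu = 1 / (1 + np.exp(-X @ beta))
        w = mu * (1 - mu)
        beta = beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - mu))
    return beta[1]

rng = np.random.default_rng(3)
n = 20000
p_true = 1 / (1 + np.exp(-rng.normal(-1.5, 1.0, n)))   # synthetic mortality risks
y = (rng.random(n) < p_true).astype(int)

slope_good = calibration_slope(p_true, y)              # close to 1.0
# an "overconfident" model: logits stretched threefold, as domain shift might produce
p_over = 1 / (1 + np.exp(-3.0 * np.log(p_true / (1 - p_true))))
slope_bad = calibration_slope(p_over, y)               # well below 1.0
```

A slope of 0.417, as reported for the most complex specification, corresponds to predictions whose logits are stretched by a factor of roughly 2.4 relative to reality.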

11
Perioperative Mortality Prediction Using a Bayesian Ensemble with Prevalence-Adaptive Gating

Pandey, A. K.

2026-04-06 health informatics 10.64898/2026.04.03.26350114 medRxiv
Top 0.1%
4.4%

Background: Perioperative mortality prediction in resource-limited surgical settings remains challenging due to class imbalance, missing data, and the heterogeneity of postoperative complications. Existing risk scores such as POSSUM depend on intraoperative variables and do not quantify prediction uncertainty. Methods: We developed a prevalence-adaptive Bayesian ensemble comprising three stochastic models: a classifier Variational Autoencoder (VAE, AUC=0.95), a Flipout Last Layer network (AUC=0.84), and a Monte Carlo Dropout network (AUC=0.80), trained on 697 patients (39 deaths, prevalence 5.59%) with 67 preoperative and postoperative features. Class imbalance (16.9:1) was addressed through Variational Autoencoder augmentation: two class-conditional generative VAEs produced 619 synthetic survivor and 619 synthetic death records, yielding a balanced training corpus of 1,935 samples. VAE augmentation was selected over SMOTE and random oversampling after a comparative study (F1: random oversampling 0.61 vs VAE augmentation 0.77). Validation used a held-out set of 233 patients (13 deaths, 220 survivors). A six-stage prediction pipeline incorporated weighted base risk, a three-path prevalence-adaptive gate, Shannon entropy uncertainty quantification, and rank-transform calibration. Sensitivity analysis was conducted across all six empirically derived hyperparameters. A whole-cohort death audit evaluated all 52 deaths from the complete 930-patient dataset through the deployed clinical decision support system. Statistical analysis included Kruskal-Wallis testing of entropy across triage groups, Wilson score confidence intervals for performance metrics, and Spearman rank correlation for LIME-SHAP interpretability concordance. Results: On the validation cohort the ensemble achieved complete separation (sensitivity 100%, specificity 100%, Youden J=1.000; TP=13, FP=0, TN=220, FN=0).
The whole-cohort death audit identified 36 of 52 deaths (sensitivity 69.2%, 95% CI 55.7%-80.1%; precision 100%, 95% CI 90.4%-100.0%; F1=0.818, bootstrap 95% CI 0.732-0.894). Shannon entropy differed significantly across triage levels (Kruskal-Wallis H(2)=24.212, p<0.001, ε²=0.453), confirming a monotone gradient SAFE < CRITICAL < GRAY ZONE. All six hyperparameters were invariant across their tested ranges (J=1.000 throughout; Supplementary Tables S1-S2). LIME and SHAP rankings showed statistically significant concordance (Spearman ρ=0.440, p=0.024; Kendall τ=0.357, p=0.011), with 4 of 6 principal mortality determinants shared across both methods. Conclusions: A prevalence-adaptive Bayesian ensemble with entropy-based uncertainty triage achieves zero false positive alerts and clinically meaningful audit sensitivity in perioperative mortality prediction. Complete hyperparameter invariance confirms that reported performance reflects structural properties of the calibration architecture. The 16 missed deaths represent feature-invisible cases beyond current observational feature capacity.
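The entropy gradient reported across triage levels reflects that binary Shannon entropy is maximal at p = 0.5 and vanishes for confident predictions, so an entropy gate naturally routes ambiguous cases to a gray zone. A sketch of such a rule; the threshold and triage names below are illustrative, not the paper's:

```python
import math

def shannon_entropy(p):
    """Binary Shannon entropy in bits: 0 for certain predictions, 1 at p = 0.5."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def triage(p, gray_zone_entropy=0.75):
    """Entropy-gated triage; the 0.75-bit cutoff is an assumed illustration."""
    if shannon_entropy(p) >= gray_zone_entropy:
        return "GRAY ZONE"                   # model too uncertain to commit
    return "CRITICAL" if p >= 0.5 else "SAFE"

labels = [triage(p) for p in (0.02, 0.45, 0.97)]   # SAFE, GRAY ZONE, CRITICAL
```

Deferring the high-entropy middle band to human review is what allows the confident SAFE/CRITICAL calls to carry zero false-positive alerts.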

12
Classification of Recurrence Status After Surgical Treatment of Chronic Subdural Hemorrhage - A Machine Learning Approach

Hamou, H.; Kernbach, J.; Ridwan, H.; Fay-Rodrian, K.; Clusmann, H.; Hoellig, A.; Veldeman, M.

2026-03-27 neurology 10.64898/2026.03.25.26349323 medRxiv
Top 0.1%
3.7%

Background Chronic subdural hematoma (cSDH) recurrence requiring reoperation occurs in 5-33% of cases, representing a substantial clinical and economic burden. The ability to predict recurrence could enable risk-stratified surveillance protocols, potentially reducing imaging burden in low-risk patients while maintaining close monitoring for high-risk individuals. We evaluated whether machine learning algorithms could achieve clinically actionable recurrence prediction using routinely available clinical and radiographic variables. Methods This retrospective single-center study included 564 consecutive patients who underwent surgical evacuation of cSDH between 2015 and 2023. Data were randomly divided into training (75%, n=422) and test (25%, n=142) sets. We developed and compared three machine learning models (regularized logistic regression, Random Forest, and XGBoost) using 31 predictor variables including demographics, comorbidities, medications, laboratory values, hematoma characteristics, and postoperative features. Model development and hyperparameter tuning were performed exclusively on the training set using 10-fold cross-validation. The best-performing model was selected and evaluated on the held-out test set. The primary outcome was postoperative recurrence requiring reoperation. Results Postoperative recurrence occurred in 170 patients (30.1%). Within the training set, XGBoost achieved the highest cross-validated ROC AUC of 0.713 (SE=0.024), outperforming regularized logistic regression (0.686) and matching Random Forest (0.713). Variable importance analysis identified hematoma volume, coagulation parameters (INR, platelets, aPTT), and disease severity markers (ICU admission, GCS) as the most influential predictors, though absolute effect sizes remained modest. On the held-out test set, the final XGBoost model achieved ROC AUC 0.688 (95% CI: 0.590-0.772) with excellent calibration.
However, at the clinically relevant 90% sensitivity threshold, test set specificity was only 30.3%, allowing potential imaging reduction in approximately one-third of non-recurrence patients. The consistency between training and test performance confirmed that limitations stem from inherent predictor information content rather than overfitting. Conclusions Machine learning models using routinely available clinical and radiographic variables cannot achieve clinically actionable risk stratification for cSDH recurrence. Despite rigorous methodology and internal validation, discriminative capacity remained insufficient to identify a low-risk patient subgroup suitable for de-escalated surveillance. These findings suggest that recurrence is driven by factors not captured in standard clinical assessment, and support either uniform surveillance protocols or symptom-driven imaging strategies rather than risk-stratified approaches.
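Reading off specificity at a fixed 90% sensitivity, as done above, amounts to choosing the classification threshold from the positive-score distribution. A sketch on synthetic scores with deliberately modest separation (all numbers illustrative, not from this cohort):

```python
import numpy as np

def spec_at_sensitivity(scores, labels, target_sens=0.90):
    """Pick the threshold giving the target sensitivity; report the specificity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    thr = np.quantile(pos, 1 - target_sens)  # ~10% of positives fall below threshold
    sens = (pos >= thr).mean()
    spec = (scores[labels == 0] < thr).mean()
    return thr, sens, spec

rng = np.random.default_rng(4)
# synthetic risk scores with modest class separation, mimicking AUC ~ 0.69
scores = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(0.7, 1.0, 430)])
labels = np.concatenate([np.zeros(1000, dtype=int), np.ones(430, dtype=int)])
thr, sens, spec = spec_at_sensitivity(scores, labels)
```

With weakly separated score distributions, holding sensitivity at 90% forces the threshold low, and specificity in the 20-40% range follows: the same trade-off that limits de-escalated surveillance here.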

13
The Visual Hemofilter: a novel visualization technology that improves task performance among intensive care professionals: A prospective simulation study.

Bider-Lunkiewicz, J.; Gasciauskaite, G.; Rück Perez, B.; Braun, J.; Willms, J.; Szekessy, H.; Nöthiger, C.; Hoffmann, M.; Milovanovic, P.; Keller, E.; Tscholl, D. W.

2026-04-20 intensive care and critical care medicine 10.64898/2026.04.16.26351012 medRxiv
Top 0.1%
3.6%

Purpose: This study evaluates the Visual Hemofilter, a novel decision-support and information transfer tool designed to assist with regional citrate anticoagulation (RCA) in hemofiltration. By representing hemofilter parameters and patient blood constituents as animated icons, the tool aims to improve clinicians' interpretation of blood gas results and RCA reference tables. We hypothesized that the Visual Hemofilter would enhance clinical decision-making by enabling faster and more accurate therapy adjustments, increasing clinicians' confidence in their decisions, and reducing cognitive workload compared to conventional methods. Methods: We conducted a prospective, randomized, computer-based simulation study across four intensive care units at the University Hospital Zurich. Twenty-six critical care professionals participated, each managing regional citrate anticoagulation (RCA) scenarios using either the Visual Hemofilter or conventional methods involving blood gas analysis and reference tables. Following each scenario, participants made therapy adjustments and rated their decision confidence and cognitive workload. Results: Use of the Visual Hemofilter significantly improved decision accuracy (odds ratio [OR] 3.96; 95% CI 2.03-7.73; p < 0.0001) and reduced decision time by an average of 33 seconds (mean difference -33.3 seconds; 95% CI -39.4 to -27.2; p < 0.0001). Participants also reported greater confidence in their decisions (OR 5.41; 95% CI 2.49-11.77; p < 0.0001) and experienced lower cognitive workload (mean difference -15.05 points on the NASA-TLX (National Aeronautics and Space Administration Task Load Index) scale; 95% CI -18.99 to -11.13; p < 0.0001). Conclusions: The Visual Hemofilter enhances clinical decision-making in RCA by increasing accuracy and speed, boosting decision confidence, and reducing cognitive workload. This technology has the potential to reduce errors and better support critical care professionals in managing complex treatment scenarios.

14
Improving Care by FAster risk-STratification through use of high sensitivity point-of-care troponin in patients presenting with possible acute coronary syndrome in the EmeRgency department (ICare-FASTER): a stepped-wedge cluster randomized trial

Than, M.; Pickering, J. W.; Joyce, L. R.; Buchan, V. A.; Florkowski, C. M.; Mills, N. L.; Hamill, L.; Prystowsky, J.; Harger, S.; Reed, M.; Bayless, J.; Feberwee, A.; Attenburrow, T.; Norman, T.; Welfare, O.; Heiden, T.; Kavsak, P.; Jaffe, A. S.; Apple, F.; Peacock, W. F.; Cullen, L.; Aldous, S.; Richards, A. M.; Lacey, C.; Troughton, R.; Frampton, C.; Body, R.; Mueller, C.; Lord, S. J.; George, P. M.; Devlin, G.

2026-04-23 cardiovascular medicine 10.64898/2026.04.21.26351433 medRxiv
Top 0.2%
2.5%
Show abstract

BACKGROUND Point-of-care (POC) high-sensitivity cardiac troponin (hs-cTn) testing has the potential to expedite decision-making and reduce emergency department (ED) length of stay for patients presenting with possible myocardial infarction (MI) by ensuring that results are consistently available when clinicians look for them. We assessed the real-life effectiveness and safety of implementing POC hs-cTn testing in the ED. METHODS We conducted a pragmatic, stepped-wedge cluster randomized trial. The control arm was usual care with an accelerated diagnostic pathway utilizing a single-sample rule-out step with a central laboratory hs-cTn assay. The intervention arm used the same pathway with a POC hs-cTnI assay. The primary effectiveness outcome was ED length of stay assessed using a generalized linear mixed model, and the safety outcome was 30-day MI or cardiac death. RESULTS Six sites participated with 59,980 ED presentations (44,747 individuals, 61 ± 19 years, 49.5% female) from February 2023 to January 2025, of which 31,392 presentations occurred during the intervention arm. After adjustment for covariates associated with length of stay, the intervention reduced length of stay by 13% (95% confidence interval [CI], 9 to 16%; P<0.001), corresponding to a reduction of 47 minutes (95% CI, 33 to 61 minutes) from a mean length of stay in the control arm of 376 minutes. The 30-day MI or cardiac death rate was similar in the control and intervention arms (0.39% and 0.39%, respectively; P=0.54). CONCLUSIONS Implementation of whole-blood hs-cTnI testing at the POC into an accelerated diagnostic pathway was safe and reduced length of stay in the ED compared with laboratory testing.

15
Feasibility of Volumetric Analysis using Bedside Ultra-Low-Field Portable Magnetic Resonance Imaging in Patients receiving Extracorporeal Membrane Oxygenation

Stockbridge, M. D.; Faria, A. V.; Neal, V.; Diaz-Carr, I.; Soule, Z.; Ahmad, Y. B.; Khanduja, S.; Whitman, G.; Hillis, A. E.; Cho, S.-M.

2026-04-13 neurology 10.64898/2026.04.09.26350481 medRxiv
Top 0.2%
2.1%
Show abstract

The SAFE MRI ECMO (NCT05469139) study established the safety of ultra-low-field 64 mT MRI in patients receiving extracorporeal membrane oxygenation (ECMO) in the intensive care setting and demonstrated that these images were highly sensitive in detecting acquired brain injuries. This retrospective analysis of prospectively collected observational data sought to expand on these findings, in light of the crucial need for neurological monitoring during ECMO, by evaluating the feasibility of volumetric analyses derived from ultra-low-field MR images. T2-weighted scans from thirty patients who received ultra-low-field MRI while undergoing ECMO at Johns Hopkins Hospital were analyzed using a volumetric pipeline to determine whole brain volume and volumes of total grey matter, total white matter, subcortical grey matter, ventricles, left hemisphere, right hemisphere, telencephalon, left and right lateral ventricles, total intracranial volume, and the cerebellum. Segmented brain volumes in patients undergoing ECMO were comparable to measurements obtained using conventional-field and ultra-low-field MRI in the absence of ECMO instrumentation. Subgroup analysis demonstrated subtle volumetric differences between patients supported with venoarterial ECMO and those receiving venovenous ECMO. These data provide the first evidence that ultra-low-field MRI provides volumetric measurements comparable to conventional field-strength MRI, even in the presence of ECMO circuitry, supporting its feasibility for neuroimaging in critically ill patients.

16
Lung Ultrasound Feature Tracking to Quantify Regional Lung Strain in Mechanically Ventilated Pigs

Walters, R.; Allen, M. B.; Scheen, H.; Beam, C.; Waldrip, Z.; Singule-Kollisch, M.; Varisco, A.; Williams, J. G.; De Luca, D.; Varisco, B. M.

2026-04-20 respiratory medicine 10.64898/2026.04.16.26351053 medRxiv
Top 0.2%
1.9%
Show abstract

Background: In patients requiring respiratory support, clinicians rely on physical exam, radiologic, laboratory, and ventilator-derived measures to provide sufficient support while minimizing ventilator- and "work of breathing"-induced lung injury. Point-of-care lung ultrasound (LUS) is a widely available tool in hospital and clinic environments. To date, LUS has not been used to evaluate lung strain. Methods: We collected LUS images in four anesthetized, neuromuscularly blocked, and mechanically ventilated pigs being used for another experiment. A feature-tracking tool was developed that tracked echo-bright lung structures in ten-second clips obtained in triplicate from the right and left, upper and lower lung fields using tidal volumes of 4, 6, 8, 10, and 12 mL/kg. Pleural lines were manually drawn, and a program for quantifying lung strain was developed with assistance from the Anthropic Claude artificial intelligence tool. Structures were identified in inspiratory and expiratory frames and tracked bidirectionally, with the median strain per frame used for calculations. Results: Triplicate lung ultrasound measurements in four pigs had a median coefficient of variation of 35% (IQR 23-47%), and linear modeling of strain against tidal volumes of 4-12 mL/kg showed a positive correlation, with R2 values ranging from 0.89 to 0.97. Strain measurements were similar after bronchial administration of 1.5 M hydrochloric acid. Conclusions: Regional lung strain quantification using LUS is a viable and potentially useful tool for respiratory support management.

17
Individualized Forecasting of Headache Attack Risk Using a Continuously Updating Model

Houle, T. T.; Lebowitz, A.; Chtay, I.; Patel, T.; McGeary, D. D.; Turner, D. P.

2026-04-22 neurology 10.64898/2026.04.20.26350119 medRxiv
Top 0.2%
1.8%
Show abstract

Importance: Migraine attacks often occur unpredictably, limiting the ability of individuals to initiate timely preventive or preemptive treatment. Short-term probabilistic forecasting of migraine risk could enable more targeted management strategies. Objective: To externally validate the previously developed Headache Prediction Model (HAPRED-I), evaluate an updated continuously learning model (HAPRED-II), and assess the feasibility and short-term safety of delivering individualized probabilistic migraine forecasts directly to patients. Design, Setting, and Participants: Prospective 8-week cohort study conducted remotely at two academic medical centers in the United States (Massachusetts General Hospital and Wake Forest Health Sciences) between 2015 and 2019. Adults with recurrent migraine or tension-type headache completed twice-daily electronic diaries. A total of 230 participants contributed 23,335 diary entries across 11,862 participant-days of observation. Main Outcomes and Measures: Occurrence of a headache attack within 24 hours following each evening diary entry. Model performance was evaluated using discrimination (area under the receiver operating characteristic curve [AUC]) and calibration. Results: External validation of HAPRED-I demonstrated modest discrimination (AUC, 0.59; 95% CI, 0.57-0.61) and poor calibration, with predicted probabilities consistently exceeding observed headache risk. In contrast, the continuously updating HAPRED-II model demonstrated progressive improvement in predictive performance as participant-specific data accumulated. Discrimination increased from an AUC of 0.59 (95% CI, 0.57-0.61) during the first 14 days to 0.66 (95% CI, 0.63-0.70) after the first month, accompanied by improved calibration across predicted risk levels. Over the study period, 6,999 individualized forecasts were delivered directly to participants.
No evidence suggested that receipt of forecasts was associated with increasing headache frequency or worsening predicted headache risk trajectories. Conclusions and Relevance: A static migraine forecasting model demonstrated limited transportability to new individuals. In contrast, models that continuously update within individuals may improve predictive accuracy over time and enable real-time delivery of personalized migraine risk forecasts. Further work incorporating richer physiologic and contextual predictors will likely be necessary before such systems can reliably guide clinical treatment decisions.

18
Risk factors, outcomes, and predictors of therapeutic response in preterm infants with patent ductus arteriosus: A retrospective cohort study

Hamida, H. B.; El Ouaer, M.; Abdelmoula, S.; El Ghali, M.; Bizid, M.; Chamtouri, I.; Monastiri, K.

2026-04-17 pediatrics 10.64898/2026.04.10.26350668 medRxiv
Top 0.2%
1.7%
Show abstract

Background: Patent ductus arteriosus (PDA) is a common and potentially serious cardiovascular condition in preterm infants, particularly those with low gestational age and birth weight. Its management remains controversial due to variability in screening, diagnostic criteria, and treatment strategies. This study aimed to evaluate risk factors, outcomes, and management strategies for PDA in preterm infants, and to identify predictors of clinical and echocardiographic response to therapy. Methods: We conducted a retrospective cohort study over a 4-year period (2016-2019) in the neonatal intensive care unit (NICU) of a tertiary care center. All consecutive preterm infants admitted during the study period were eligible. Infants with echocardiographically confirmed PDA who received pharmacological treatment with intravenous paracetamol or ibuprofen were included in the analysis. Missing data were minimal and handled using available-case analysis. Statistical analyses included descriptive statistics, Pearson's chi-square test, and multivariable logistic regression. Results: Among 2,154 preterm infants admitted to the NICU, 60 were diagnosed with PDA (incidence: 2.8%). The mean gestational age was 29 ± 2.6 weeks, and the median birth weight was 1,200 g. Respiratory distress occurred in 95% of cases, mainly due to hyaline membrane disease (86.7%). PDA was symptomatic in 80% of infants. First-line treatment resulted in clinical improvement in 77% and ductal closure in 83.3% of cases, most within 3 days. Predictors of successful closure included gestational age ≥28 weeks (OR = 5.9; 95% CI: 1.7-20.2) and antenatal corticosteroid exposure (OR = 1.2; 95% CI: 1.0-1.6). Overall mortality was 35% and was significantly higher in infants <28 weeks (OR = 5.0; 95% CI: 2.4-10.3). Clinical improvement (OR = 3.7) and echocardiographic closure (OR = 4.5) after first-line treatment were associated with reduced mortality.
Conclusions: PDA in preterm infants is associated with substantial morbidity and mortality, particularly in those born before 28 weeks of gestation. Early diagnosis, antenatal corticosteroid exposure, and timely pharmacological treatment may improve outcomes. Systematic echocardiographic screening in high-risk neonates should be considered.

19
SMART-HF: Structured Management Approach to Remote Treatment of Heart Failure Associated With Predictable Hemodynamic Improvements In A Community Remote Pulmonary Artery Pressure Monitoring Program

Atzenhoefer, M.; Nelson, B.; Atzenhoefer, T. E.; Staudacher, M.; Boxwala, H.; Iqbal, F. M.

2026-04-16 cardiovascular medicine 10.64898/2026.04.12.26350637 medRxiv
Top 0.2%
1.7%
Show abstract

Aims: Responses to remote pulmonary artery pressure data vary across programs. We evaluated SMART-HF, a structured pulmonary artery diastolic pressure (PAD)-guided workflow, in a community heart failure cohort. Methods: We retrospectively analysed adults with heart failure and an implanted pulmonary artery pressure sensor managed with SMART-HF. PAD was calculated from prespecified 14-day windows at baseline, 90 days, and 6 months. Two hemodynamic management performance indices (HMPI) were prespecified: the 6-Month Delta HMPI (PAD reduction >2 mmHg from baseline) and the 90-Day Target HMPI (PAD ≤20 mmHg at 90 days). Exploratory analyses evaluated patients with baseline PAD >20 mmHg. Results: Of 37 patients, 36 had paired 90-day and 29 had paired 6-month windows. Mean PAD decreased from 18.3 ± 7.0 to 16.1 ± 6.3 mmHg at 90 days and from 18.8 ± 6.8 to 15.5 ± 5.8 mmHg at 6 months (both P < 0.001). The 90-Day Target HMPI was achieved in 26/36 patients (72.2%) and the 6-Month Delta HMPI in 19/29 (65.5%; 95% CI 45.7-82.1). In the exploratory subgroup (baseline PAD >20 mmHg), mean PAD changes were -2.9 ± 3.6 mmHg at 90 days (n = 19; P = 0.002) and -4.9 ± 4.9 mmHg at 6 months (n = 15; P = 0.002). Conclusions: SMART-HF was associated with improved ambulatory PAD control at 90 days and 6 months. Exploratory subgroup findings support further evaluation in patients with elevated baseline PAD.
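The two prespecified HMPI criteria in this abstract reduce to simple threshold checks on windowed mean PAD values. A minimal sketch (function names and the helper structure are illustrative, not taken from the study):

```python
def delta_hmpi_6mo(baseline_pad_mmhg: float, six_month_pad_mmhg: float) -> bool:
    """6-Month Delta HMPI: PAD reduced by more than 2 mmHg from baseline."""
    return (baseline_pad_mmhg - six_month_pad_mmhg) > 2.0


def target_hmpi_90d(ninety_day_pad_mmhg: float) -> bool:
    """90-Day Target HMPI: mean PAD at or below 20 mmHg at 90 days."""
    return ninety_day_pad_mmhg <= 20.0


# Using the cohort means reported above as example inputs:
# baseline 18.8 mmHg -> 15.5 mmHg at 6 months meets the Delta criterion,
# and 16.1 mmHg at 90 days meets the Target criterion.
```

Note that the study applies these checks to per-patient 14-day window means, not cohort means; the cohort figures above are used here only to illustrate the thresholds.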

20
Most Instability Phases Resolve: Empirical Evidence for Trajectory Plasticity in Multimorbidity Care from Longitudinal Relational Monitoring

Martin, C. M.; Henderson, I.; Campbell, D.; Stockman, K.

2026-04-24 health informatics 10.64898/2026.04.22.26351537 medRxiv
Top 0.2%
1.6%
Show abstract

Background: The instability-plasticity framework proposes that multimorbidity trajectories periodically enter instability phases that are vulnerable to escalation but also potentially modifiable through relational intervention. Whether such phases commonly resolve without acute care, or predominantly progress to hospitalisation, has not been quantified at scale. Objective: To quantify instability window outcomes across a longitudinal monitoring cohort; to test whether the characteristics distinguishing admitted from resolved windows reflect within-patient trajectory dynamics or between-patient severity; and to characterise which patient-reported and operator-rated signals reliably precede admission, using both a curated pilot sub-cohort and the full monitoring cohort with an explicit cross-cohort comparison. Methods: Two complementary analyses were conducted on data from the MonashWatch Patient Journey Record (PaJR) relational telehealth system. Instability windows were identified algorithmically (≥2 consecutive calls with Total_Alerts ≥3) across the full longitudinal dataset (16,383 calls, 244 patients, 2.5 years) and classified by linkage to ED and hospital admission data. Window characteristics were compared at window, patient, and paired within-patient levels. Pre-admission signal cascades were analysed in two configurations: a curated pilot sub-cohort (64 patients, 280 calls, ±10-day window, 103 admissions, December 2016-September 2017) and the full monitoring cohort (175 patients, 1,180 pre-admission calls, ±14-day window, December 2016-July 2019). A three-way cross-cohort comparison decomposed differences between the two configurations into pipeline and population effects. Results: 621 instability windows were identified across 157 patients (64% of the monitored cohort); 67.3% resolved without hospital admission or ED attendance, a rate stable across alert thresholds 1-5.
In paired within-patient analysis (n = 70), duration in days (p = 0.002) and multi-domain breadth (p < 0.001) distinguished admitted from resolved windows; alert intensity did not. In the pilot sub-cohort, patient-reported illness prognosis (Q21) was the dominant pre-admission signal (GEE beta = +0.058, AUC = 0.647, p-BH = 0.018). This finding did not replicate in the full cohort: Q21 was non-significant (GEE beta = -0.008, p = 0.154, AUC = 0.507). Cross-cohort analysis identified selective curation of the pilot sub-cohort as the primary explanation. In the full cohort, six signals escalated significantly before admission after Benjamini-Hochberg correction: total alerts, health impairment (Q26), red alerts, self-rated health (Q3), patient concerns (Q1), and operator concern (Q34). Health impairment achieved the highest individual AUC (0.605) and showed the longest pre-admission lead. No individual signal exceeded AUC 0.61. Conclusions: Two-thirds of instability phases resolve without hospitalisation, providing direct empirical support for trajectory plasticity as a clinically frequent phenomenon. Within the same patient, persistence - in duration and in the consistency of high-severity multi-domain flagging across calls - distinguishes trajectories that tip into admission from those that resolve. The Q21 signal reversal between cohorts illustrates how selective curation can produce compelling but non-replicable findings in monitoring research. In the full population, objective alert signals and operator judgement, rather than patient illness prognosis, carry the pre-admission signal.
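The window-detection rule stated in the Methods (at least 2 consecutive calls with Total_Alerts at or above 3) can be sketched as a single scan over a patient's chronological call sequence. This is an illustrative reconstruction under the stated rule, not the authors' code; the function name and the returned index format are assumptions:

```python
def find_instability_windows(total_alerts, threshold=3, min_length=2):
    """Return (start, end) index pairs for each run of at least
    `min_length` consecutive calls whose Total_Alerts value meets
    `threshold`, scanning the call sequence once in order."""
    windows = []
    run_start = None
    for i, alerts in enumerate(total_alerts):
        if alerts >= threshold:
            if run_start is None:
                run_start = i  # a qualifying run begins here
        else:
            # run ended; keep it only if it was long enough
            if run_start is not None and i - run_start >= min_length:
                windows.append((run_start, i - 1))
            run_start = None
    # handle a run that extends to the end of the sequence
    if run_start is not None and len(total_alerts) - run_start >= min_length:
        windows.append((run_start, len(total_alerts) - 1))
    return windows


# Example: two qualifying windows; the lone spike at index 4 is too short.
print(find_instability_windows([1, 3, 4, 0, 5, 2, 3, 3, 3]))
# -> [(1, 2), (6, 8)]
```

Classifying each detected window as "resolved" or "admitted" would then be a separate linkage step against ED and admission records, which the sketch above does not attempt.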